fix: unify ordering display with optimization path #20362
adriangb wants to merge 7 commits into apache:main
@zhuqi-lucas could you review this change please?
Previously, `get_projected_output_ordering` used `ordered_column_indices_from_projection`, which was all-or-nothing: if any expression in the projection wasn't a simple Column, it returned None for the entire projection, even if the sort columns themselves were simple column refs.

Replace it with `resolve_sort_column_projection`, which only requires sort-column positions to resolve to simple Columns. Each ordering is now independently evaluated: orderings on simple column refs get validated with statistics even when other projection expressions are complex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
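The difference between the two resolution strategies described in this commit message can be sketched with a toy model. This is only an illustration: `ProjExpr`, `all_or_nothing`, and `resolve_sort_columns` are hypothetical names invented here, not DataFusion types or the actual helper signatures.

```rust
/// Toy stand-in for a projection expression (hypothetical, not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
enum ProjExpr {
    /// Plain reference to the table column at this index.
    Column(usize),
    /// Any other expression, e.g. `a + 1`.
    Complex(String),
}

/// All-or-nothing strategy: returns None as soon as any projected
/// expression is not a plain column, regardless of the sort key.
fn all_or_nothing(projection: &[ProjExpr]) -> Option<Vec<usize>> {
    projection
        .iter()
        .map(|e| match e {
            ProjExpr::Column(i) => Some(*i),
            ProjExpr::Complex(_) => None,
        })
        .collect()
}

/// Per-sort-column strategy: only the output positions used by the
/// sort key have to resolve to plain columns.
fn resolve_sort_columns(
    projection: &[ProjExpr],
    sort_positions: &[usize],
) -> Option<Vec<usize>> {
    sort_positions
        .iter()
        .map(|&pos| match projection.get(pos)? {
            ProjExpr::Column(i) => Some(*i),
            ProjExpr::Complex(_) => None,
        })
        .collect()
}
```

With a projection such as `[Column(2), Complex("a + 1")]`, the first function gives up entirely, while the second still resolves an ordering on output position 0 to table column 2.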
Replace the independent display computation (get_projected_output_ordering) with orderings extracted from eq_properties().oeq_class(), so EXPLAIN output always matches what the optimizer actually sees.

Previously, fmt_as() independently recomputed orderings via get_projected_output_ordering(), which validated post-projection and would drop valid orderings when any projection expression was complex (e.g. `a + 1`). Now both display and optimization use the same path: validate at table-schema level, then project through EquivalenceProperties::project().

- Delete get_projected_output_ordering and resolve_sort_column_projection
- Update DataSource::fmt_as and DisplayAs::fmt_as to use eq_properties()
- Add regression tests for complex projections with multi-file groups
- Update SLT expectations for equivalence-aware ordering display

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The partition/file ordering diagrams from the deleted get_projected_output_ordering are useful context for understanding why we validate orderings against file statistics. Move them to validated_output_ordering where they belong.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
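As a rough illustration of why a declared ordering has to be checked against file statistics, the sketch below models each file by the min/max of its sort column. `FileRange` and both functions are hypothetical simplifications, not the real `validated_output_ordering` logic, and the use of `<=` rather than a strict comparison is an assumption of this sketch.

```rust
/// Min/max of the sort column for one file
/// (hypothetical simplification of per-file statistics).
#[derive(Debug, Clone, Copy)]
struct FileRange {
    min: i64,
    max: i64,
}

/// Files in a group are read one after another, so an ascending ordering
/// only survives if each file starts at or after the previous file's max.
fn group_preserves_ordering(files: &[FileRange]) -> bool {
    files.windows(2).all(|w| w[0].max <= w[1].min)
}

/// The declared ordering is kept only if every file group preserves it.
fn ordering_is_valid(groups: &[Vec<FileRange>]) -> bool {
    groups.iter().all(|g| group_preserves_ordering(g))
}
```

A group holding files with ranges 0..=9 and 10..=19 keeps the ordering; a group holding 0..=15 followed by 10..=19 overlaps, so the ordering would be dropped for the whole scan.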
Force-pushed from c830e2c to 73af179.
Hi @zhuqi-lucas quick bump 😄
Pull request overview
This PR aligns DataSourceExec ordering display with the optimizer’s ordering/equivalence analysis so EXPLAIN reflects the same orderings the optimizer uses, avoiding mismatches for complex projections and equivalence-derived orderings.
Changes:
- Switch `FileScanConfig` display to use `eq_properties().oeq_class()` rather than recomputing projected orderings for formatting.
- Remove the old display-only projected-ordering computation helpers.
- Update SLT expectations to reflect equivalence-aware ordering display (including multiple equivalent orderings) and add regression tests for the unified behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| datafusion/datasource/src/file_scan_config.rs | Unifies displayed orderings with optimizer orderings via oeq_class(), removes old display path helpers, and adds regression tests. |
| datafusion/sqllogictest/test_files/window.slt | Updates expected DataSourceExec ordering display (including multiple orderings). |
| datafusion/sqllogictest/test_files/union.slt | Updates ordering display expectations for union inputs. |
| datafusion/sqllogictest/test_files/topk.slt | Updates TopK-related ordering display expectations to show equivalence-derived orderings. |
| datafusion/sqllogictest/test_files/sort_pushdown.slt | Updates sort-pushdown ordering display to match equivalence-aware output ordering. |
| datafusion/sqllogictest/test_files/monotonic_projection_test.slt | Updates monotonic-projection ordering expectations to include additional equivalent orderings. |
| datafusion/sqllogictest/test_files/joins.slt | Updates join projection pushdown ordering display to include equivalent ordering on projected expression. |
| datafusion/sqllogictest/test_files/group_by.slt | Updates ordering display expectations in group-by plans to match equivalence-aware ordering sets. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Hi @adriangb, sorry, I am out for the Chinese New Year holiday. I will review this when I am back.
Sorry to ping you on holiday then, no rush, just thought it might have slipped through the cracks!
Comprehensive review of the PR that fixes display/optimizer ordering disagreement in FileScanConfig by replacing the independent display path with eq_properties().oeq_class(). https://claude.ai/code/session_01HerFnFzGc7s4AQppknpup3
zhuqi-lucas
left a comment
Thanks @adriangb for the work on unifying the display and optimization paths!
I noticed several changes in the EXPLAIN output that seem like regressions from a user-facing perspective:
- The order of multiple orderings changed
In many tests (e.g., group_by.slt, window.slt):
Before: output_orderings=[[a@1 ASC, b@2 ASC], [c@3 ASC]]
After: output_orderings=[[c@3 ASC], [a@1 ASC, b@2 ASC]]
The original ordering list followed the table's declared sort order, which was intuitive. Now it follows the internal projected_orderings() generation order from the dependency map, which is less predictable for users reading EXPLAIN output. I am not sure this is the right behaviour.
- Filter-constant columns are stripped
In sort_pushdown.slt:
Before: output_ordering=[timeframe@0 ASC NULLS LAST, period_end@1 ASC NULLS LAST]
After: output_ordering=[period_end@1 ASC NULLS LAST]
The physical file ordering is no longer visible in EXPLAIN. While the optimizer correctly knows timeframe is constant after filter pushdown, the user loses visibility into the actual file sort order.
I understand the goal is to make display match what the optimizer sees, but could we achieve the unification (e.g., removing the separate get_projected_output_ordering code path) while still preserving the original ordering list order and showing the full physical orderings? For example, using validated_output_ordering() with proper projection handling, without going through the full equivalence-class normalization for display purposes.
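The second point above (a leading sort column disappearing once a filter makes it constant) can be pictured with a small sketch. This is a toy model only: `strip_constants` is a hypothetical helper, and real equivalence-class normalization does considerably more than drop constant columns.

```rust
use std::collections::HashSet;

/// Drop sort columns that are known to be constant (for example after a
/// pushed-down equality filter). Hypothetical helper for illustration.
fn strip_constants<'a>(ordering: &[&'a str], constants: &HashSet<&str>) -> Vec<&'a str> {
    ordering
        .iter()
        .copied()
        .filter(|c| !constants.contains(*c))
        .collect()
}
```

With `timeframe` marked constant, the physical ordering `[timeframe, period_end]` normalizes to `[period_end]`, which is the shape of the `sort_pushdown.slt` change quoted in the review.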
Summary
Unify the ordering display path with the optimization path so EXPLAIN output always matches what the optimizer sees.
`FileScanConfig` previously had two independent paths computing orderings:
1. Optimization path (`eq_properties()`): validates orderings at table-schema level via `validated_output_ordering()`, then projects through `EquivalenceProperties::project()`.
2. Display path (`fmt_as()`): independently recomputed via `get_projected_output_ordering()`, which validated post-projection and could disagree with path 1.

The display path dropped valid orderings when any projection expression was complex (e.g. `a + 1`), even if the ordering column itself was a simple column reference. This PR replaces the display computation with `eq_properties().oeq_class()`, the same orderings the optimizer uses.

Changes
- Replace `get_projected_output_ordering()` calls in both `DataSource::fmt_as` and `DisplayAs::fmt_as` with `self.eq_properties().oeq_class()`
- Delete `get_projected_output_ordering` and `resolve_sort_column_projection` (no longer needed)
- Add regression tests:
  - `test_display_ordering_with_complex_projection_multi_file`: complex projections no longer drop valid orderings
  - `test_display_ordering_dropped_for_overlapping_stats`: overlapping file stats correctly suppress orderings
  - `test_display_ordering_matches_eq_properties`: display and optimization paths agree
- Update SLT expectations for equivalence-aware ordering display (… `CAST`)

Test plan
- `cargo test -p datafusion-datasource` (100 tests pass)
- SLT files: `sort_pushdown`, `union`, `window`, `monotonic_projection_test`, `topk`, `group_by`, `joins`

🤖 Generated with Claude Code
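For readers unfamiliar with the EXPLAIN strings quoted in this thread, the sketch below shows how a list of orderings could be rendered in the two shapes that appear above (`output_ordering=[...]` for one ordering, `output_orderings=[[...], [...]]` for several). `SortExpr`, `fmt_ordering`, and `fmt_orderings` are hypothetical names for illustration and do not reproduce DataFusion's actual `fmt_as` code.

```rust
/// One sort expression as it appears in EXPLAIN output, e.g. `a@1 ASC`.
/// Hypothetical type for illustration only.
struct SortExpr {
    column: &'static str,
    index: usize,
    options: &'static str,
}

/// Render a single ordering as `[a@1 ASC, b@2 ASC]`.
fn fmt_ordering(ordering: &[SortExpr]) -> String {
    let parts: Vec<String> = ordering
        .iter()
        .map(|e| format!("{}@{} {}", e.column, e.index, e.options))
        .collect();
    format!("[{}]", parts.join(", "))
}

/// Render zero, one, or many orderings in the shapes seen in the thread.
fn fmt_orderings(orderings: &[Vec<SortExpr>]) -> Option<String> {
    match orderings {
        [] => None,
        [single] => Some(format!("output_ordering={}", fmt_ordering(single))),
        many => {
            let parts: Vec<String> = many.iter().map(|o| fmt_ordering(o)).collect();
            Some(format!("output_orderings=[{}]", parts.join(", ")))
        }
    }
}
```

Because the rendering is a pure function of the ordering list, feeding it the optimizer's own orderings is what makes the displayed order of the list depend on how those orderings were generated.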